On Optimal Frame Conditioners

Chae Clark and Kasso A. Okoudjou 


Abstract. A (unit norm) frame is scalable if its vectors can be rescaled so
as to result in a tight frame. Tight frames can be considered optimally
conditioned because the condition number of their frame operators is unity.

In this paper we reformulate the scalability problem as a convex optimization
question. In particular, we present examples of various formulations of
the problem along with numerical results obtained by using our methods on
randomly generated frames.

1. Frames and scalable frames 

1.1. Introduction. A finite frame for $\mathbb{R}^N$ is a set $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ such
that there exist positive constants $0 < A \le B < \infty$ (referred to as the frame
bounds) for which

$$A\|x\|_2^2 \le \sum_{k=1}^{M} |\langle x, \varphi_k \rangle|^2 \le B\|x\|_2^2$$

for all $x \in \mathbb{R}^N$. Given a frame $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$, we denote again by $\Phi$ the
$N \times M$ matrix whose $k$th column is the vector $\varphi_k$. The matrix $\Phi$ is the synthesis
operator associated to the frame $\Phi$, and its transpose $\Phi^T$ is the analysis operator
of $\Phi$. The frame operator is then defined as $S = \Phi\Phi^T$. When $A = B$ the frame is
called tight, in which case the frame operator is $S = AI$, where $I$ denotes the $N \times N$
identity matrix.
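As a minimal sketch of these definitions, the following Python snippet forms the frame operator of a small frame for $\mathbb{R}^2$ and reads off its optimal frame bounds as the extreme eigenvalues of $S$. The three vectors and the helper names `frame_operator` and `frame_bounds_2d` are illustrative assumptions, not taken from the paper.

```python
import math

def frame_operator(vectors):
    """Return S = Phi Phi^T as a nested list; the frame vectors are the columns of Phi."""
    n = len(vectors[0])
    return [[sum(v[i] * v[j] for v in vectors) for j in range(n)] for i in range(n)]

def frame_bounds_2d(S):
    """Optimal frame bounds (A, B) for a frame of R^2: the eigenvalues of the symmetric 2x2 matrix S."""
    trace = S[0][0] + S[1][1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    gap = math.sqrt(trace * trace - 4.0 * det)
    return (trace - gap) / 2.0, (trace + gap) / 2.0

# Hypothetical example frame (an assumption made for illustration).
phi = [(0.0, 1.0), (2.0, 1.0), (2.0, -1.0)]
S = frame_operator(phi)
A, B = frame_bounds_2d(S)
print(S)      # the frame operator
print(A, B)   # the frame is tight exactly when A == B
```

For this example the two bounds differ, so the frame is not tight and its frame operator has condition number $B/A > 1$.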

1.2. Scalable Frames. Scalable frames were introduced in HSU as a method 
to convert a non tight frame into a tight one. More precisely: 

Definition 1.1. Let $M \ge N$ be given. A frame $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ is scalable
if there exist a subset $\Phi_J = \{\varphi_k\}_{k \in J}$ with $J \subseteq \{1, 2, \ldots, M\}$, and positive scalars
$\{x_k\}_{k \in J}$ such that the system $\{x_k \varphi_k\}_{k \in J}$ is a tight frame for $\mathbb{R}^N$.

Let $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ be a frame. Then the analysis operator of the scaled
frame $\{x_k \varphi_k\}_{k=1}^{M}$ is given by $X\Phi^T$, where $X$ is the diagonal matrix with the values
$x_k$ on its diagonal. Hence, the frame $\Phi$ is scalable if and only if there exists a
diagonal matrix $X = \mathrm{diag}(x_k)$, with $x_k \ge 0$, such that

(1.1) $\tilde{S} = \Phi X^2 \Phi^T = AI$

for some constant $A > 0$. Without loss of generality we may assume that $A = 1$;
otherwise replace the diagonal matrix $X^2$ by $Y = X^2/A$.
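To make condition (1.1) concrete, here is a small Python sketch that scales a non-tight frame for $\mathbb{R}^2$ and checks that the scaled frame operator is a multiple of the identity. The frame, the hand-picked scalars, and the helper name `scaled_frame_operator` are assumptions chosen for illustration.

```python
import math

def scaled_frame_operator(vectors, scalars):
    """Return Phi X^2 Phi^T, the frame operator of the scaled frame {x_k * phi_k}."""
    n = len(vectors[0])
    return [[sum((x * x) * v[i] * v[j] for v, x in zip(vectors, scalars))
             for j in range(n)] for i in range(n)]

# Hypothetical frame and scalars (illustrative assumptions, not from the paper).
phi = [(0.0, 1.0), (2.0, 1.0), (2.0, -1.0)]
x = [math.sqrt(6.0), 1.0, 1.0]

S_scaled = scaled_frame_operator(phi, x)        # a multiple A * I of the identity
A = S_scaled[0][0]
y = [xk * xk / A for xk in x]                   # Y = X^2 / A normalises the bound to 1
S_unit = scaled_frame_operator(phi, [math.sqrt(yk) for yk in y])
print(S_scaled)
print(S_unit)
```

The second matrix illustrates the normalisation step: replacing $X^2$ by $X^2/A$ yields a frame operator equal to the identity up to rounding.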
One can convert (1.1) into a linear system of equations in the $M$ unknowns $x_k^2$. To
write out this linear system we need the following function: $F : \mathbb{R}^N \to \mathbb{R}^d$ given by

$$F(x) = [F_0(x), F_1(x), \ldots, F_{N-1}(x)]^T,$$

$$F_0(x) = \begin{bmatrix} x_1^2 - x_2^2 \\ x_1^2 - x_3^2 \\ \vdots \\ x_1^2 - x_N^2 \end{bmatrix}, \qquad
F_k(x) = \begin{bmatrix} x_k x_{k+1} \\ x_k x_{k+2} \\ \vdots \\ x_k x_N \end{bmatrix},$$

and $F_0(x) \in \mathbb{R}^{N-1}$, $F_k(x) \in \mathbb{R}^{N-k}$, $k = 1, 2, \ldots, N-1$, where $d := \frac{(N-1)(N+2)}{2}$.
Let $F(\Phi)$ be the $d \times M$ matrix given by

$$F(\Phi) = \bigl(F(\varphi_1) \ \ F(\varphi_2) \ \ \ldots \ \ F(\varphi_M)\bigr).$$

In this setting we have the following solution to the scalability problem: 

Proposition 1.2. [9, Proposition 3.7] A frame $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ is scalable
if and only if there exists a non-negative $u \in \ker F(\Phi) \setminus \{0\}$.
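The map $F$ and the test in Proposition 1.2 can be sketched in a few lines of Python. The helper names `F`, `F_matrix`, and `apply`, together with the example frame and the vector `u`, are assumptions made for illustration.

```python
def F(x):
    """Map x in R^N to F(x) in R^d, d = (N-1)(N+2)/2, stacking F_0, F_1, ..., F_{N-1}."""
    n = len(x)
    f0 = [x[0] ** 2 - x[j] ** 2 for j in range(1, n)]                      # F_0: N-1 entries
    rest = [x[k] * x[j] for k in range(n - 1) for j in range(k + 1, n)]    # F_1, ..., F_{N-1}
    return f0 + rest

def F_matrix(vectors):
    """The d x M matrix F(Phi), stored as a list of rows; column k is F(phi_k)."""
    cols = [F(v) for v in vectors]
    return [[col[r] for col in cols] for r in range(len(cols[0]))]

def apply(matrix, u):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, u)) for row in matrix]

# Hypothetical frame for R^2 and a candidate non-negative vector u (assumptions).
phi = [(0.0, 1.0), (2.0, 1.0), (2.0, -1.0)]
u = [6.0, 1.0, 1.0]
print(F_matrix(phi))
print(apply(F_matrix(phi), u))   # the zero vector means u lies in ker F(Phi)
```

The equations $F(\Phi)u = 0$ say that $\sum_k u_k \varphi_k \varphi_k^T$ has equal diagonal entries and vanishing off-diagonal entries, which is why a non-negative, nonzero kernel vector certifies scalability.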

1.3. Mathematical Programming and Duality. Our main goal is to find
a non-negative nontrivial vector in the null space of $F(\Phi)$ using some optimization
methods. For this reason we recall some notions from duality theory in mathematical
programming. For a more robust treatment of duality theory applied to linear
programs, we refer to the standard texts by S. Boyd and L. Vandenberghe [3] and by
D. Bertsimas and J. Tsitsiklis [2]. Recall that the primal and dual linear mathematical
programming problems are defined, respectively, as follows:

minimize:   $c^T x$
subject to: $Ax = b$
            $x \succeq 0$,

maximize:   $b^T y$
subject to: $A^T y \preceq c$
            $y \in \mathbb{R}^N$.

Theorem 1.3 (Strong Duality). If either the primal or dual problem has a 
finite optimal value, then so does the other. The optimal values coincide, and 
optimal solutions to both the primal and dual problems exist. 

Theorem 1.4 (Complementary Slackness). Let $x^*$ and $y^*$ be feasible solutions
to the primal and dual problems respectively. Let $A$ be an $N$ by $M$ matrix, where
$A_j$ denotes the $j$th column and $a_i$ denotes the $i$th row of $A$. Then $x^*$ and $y^*$ are
optimal solutions to their respective problems if and only if

$$y_i^*(a_i \cdot x^* - b_i) = 0 \quad \text{for all } i = 1, \ldots, N,$$

and

$$x_j^*(c_j - (y^*)^T A_j) = 0 \quad \text{for all } j = 1, \ldots, M.$$
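The two theorems can be checked by hand on a toy problem. The sketch below uses a hypothetical one-constraint linear program (all data are assumptions chosen for illustration) and verifies feasibility, the complementary slackness products, and the equality of the two objective values.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Hypothetical toy problem: minimize c^T x subject to Ax = b, x >= 0.
A = [[1.0, 1.0, 1.0]]        # rows a_i (a single equality constraint)
b = [1.0]
c = [2.0, 1.0, 3.0]

x_star = [0.0, 1.0, 0.0]     # candidate primal solution
y_star = [1.0]               # candidate dual solution

columns = list(zip(*A))      # columns A_j
primal_feasible = (all(abs(dot(row, x_star) - bi) < 1e-12 for row, bi in zip(A, b))
                   and all(xj >= 0 for xj in x_star))
dual_feasible = all(dot(y_star, col) <= cj + 1e-12 for col, cj in zip(columns, c))

# Complementary slackness products; every one of them must vanish at optimality.
row_products = [yi * (dot(row, x_star) - bi) for yi, row, bi in zip(y_star, A, b)]
col_products = [xj * (cj - dot(y_star, col)) for xj, cj, col in zip(x_star, c, columns)]

print(primal_feasible, dual_feasible)
print(row_products, col_products)
print(dot(c, x_star), dot(b, y_star))   # equal objective values, as strong duality predicts
```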
2. Reformulation of the Scalability Problem as an Optimization Problem

This section establishes the equivalence of generating a scaling matrix $X$ and
solving an optimization problem with a generic convex objective function. More
specifically, we shall phrase the scalability problem as a linear and convex programming
problem.

First consider the sets $S_1$ and $S_2$ given by

$$S_1 := \{u \in \mathbb{R}^M \mid F(\Phi)u = 0,\ u \succeq 0,\ u \neq 0\},$$

and

$$S_2 := \{v \in \mathbb{R}^M \mid F(\Phi)v = 0,\ v \succeq 0,\ \|v\|_1 = 1\}.$$

$S_1$ is a subset of the null space of $F(\Phi)$, and to each $u \in S_1$ is associated a scaling
matrix $X_u$, defined as

$$X_u := (X_{ij})_u = \begin{cases} \sqrt{u_i} & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

Moreover, $S_2 \subset S_1 \cap B_{\ell^1}$, where $B_{\ell^1}$ is the unit ball under the $\ell^1$ norm.

We observe that a frame $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ is scalable if and only if there
exists a scaling matrix $X_u$ with $u \in S_1$. Consequently, one can associate to $X_u$ a
scaling matrix $X_v$ with $v \in S_2$. The normalized set $S_2$ ensures that the constraints
in the optimization problems to be presented are convex.


Theorem 2.1. Let $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ be a frame, and let $f : \mathbb{R}^M \to \mathbb{R}$ be a
convex function. Then the program

(2.1)  $(\mathcal{P})$  minimize:   $f(u)$
              subject to: $F(\Phi)u = 0$
                          $\|u\|_1 = 1$
                          $u \succeq 0$

has a solution if and only if the frame $\Phi$ is scalable.

Proof. Any feasible solution $u^*$ of $\mathcal{P}$ is contained in the set $S_2$, which itself is
contained in $S_1$, and thus corresponds to a scaling matrix $X_{u^*}$.

Conversely, any $u \in S_1$ can be mapped to a $v \in S_2$ by an appropriate scaling
factor, namely $v = u/\|u\|_1$. This provides an initial feasible solution to $\mathcal{P}$, and as $f$ is
convex and the feasible set is convex and bounded, there must exist a minimizer of $\mathcal{P}$. □


Theorem 2.1 is very general in that the convex objective function $f$ can be
chosen so that the resulting frame has some desirable properties. We now consider
certain interesting examples of objective functions $f$. These examples can be related
to the sparsity (or lack thereof) of the desired solution. Using a linear objective
function promotes sparsity, while barrier objectives promote dense solutions (a small
number of zero elements in $u$).


2.1. Linear Program Formulation. Assume that the objective function in
(2.1) is given by $f(u) := a^T u$ for some coefficient vector $a \in \mathbb{R}^M \setminus \{0\}$. Our program
$\mathcal{P}$ now becomes
(2.2)  $(\mathcal{P}_1)$  minimize:   $a^T u$
               subject to: $F(\Phi)u = 0$
                           $\|u\|_1 = 1$
                           $u \succeq 0$.


Choosing the coefficients $a$ independently of the variables $u$ results in a linear
program. For example, the choice $a_i = 1$ for all $i$ results in a program to minimize
the $\ell^1$ norm of $u$. Another, more useful choice of coefficients gives a higher weight
to the frame elements with smaller norm (which
further encourages sparsity).
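For very small frames, the linear program can be solved by brute force, which makes its structure easy to inspect. The sketch below is not the Simplex algorithm: it simply enumerates the basic solutions of the equality constraints and keeps the best non-negative one, relying on the standard fact that a linear program in this form with an optimal solution attains it at a basic feasible solution. It assumes that NumPy is available and that the stacked constraint matrix has full row rank; the frame, the weights `a`, and the function names are illustrative assumptions.

```python
from itertools import combinations
import numpy as np

def F(x):
    """The map F from Section 1.2, returned as a flat list of length d."""
    n = len(x)
    f0 = [x[0] ** 2 - x[j] ** 2 for j in range(1, n)]
    rest = [x[k] * x[j] for k in range(n - 1) for j in range(k + 1, n)]
    return f0 + rest

def solve_lp_by_vertices(phi, a, tol=1e-9):
    """Minimise a^T u over {F(Phi) u = 0, sum(u) = 1, u >= 0} by enumerating basic solutions."""
    FPhi = np.array([F(v) for v in phi], dtype=float).T      # d x M
    L = np.vstack([FPhi, np.ones(FPhi.shape[1])])            # (d+1) x M constraint matrix
    rhs = np.zeros(L.shape[0])
    rhs[-1] = 1.0
    m, M = L.shape
    best_u, best_val = None, np.inf
    for basis in combinations(range(M), m):
        B = L[:, list(basis)]
        if abs(np.linalg.det(B)) < tol:                      # skip singular bases
            continue
        ub = np.linalg.solve(B, rhs)
        if np.any(ub < -tol):                                # basic solution is not feasible
            continue
        u = np.zeros(M)
        u[list(basis)] = np.clip(ub, 0.0, None)
        val = float(np.dot(a, u))
        if val < best_val:
            best_u, best_val = u, val
    return best_u, best_val                                  # (None, inf) if nothing feasible was found

# Hypothetical frame for R^2 and weights (assumptions made for illustration).
phi = [(0.0, 1.0), (2.0, 1.0), (2.0, -1.0), (1.0, 0.0)]
a = np.array([1.0, 2.0, 2.0, 1.0])
u_opt, val_opt = solve_lp_by_vertices(phi, a)
print(u_opt, val_opt)
```

Every candidate produced this way has at most $d + 1$ nonzero entries, which is the mechanism behind the sparsity of linear-programming solutions.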

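One of the advantages of linear programs is that they admit a strong dual
formulation. To the primal problem $\mathcal{P}_1$ corresponds the following dual problem $\mathcal{P}_2$.


Proposition 2.2. Let $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ be a frame. The program

$(\mathcal{P}_2)$  maximize:   $w$
        subject to: $\begin{bmatrix} F(\Phi)^T & \mathbf{1} \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} \preceq a$

is the strong dual of $\mathcal{P}_1$.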


Proof. This result follows exactly from the construction of dual formulations
for linear programs. The primal problem can be formulated as follows:

minimize:   $\sum_{i=1}^{M} a_i u_i$
subject to: $F(\Phi)u = 0$
            $\sum_{i=1}^{M} u_i = 1$
            $u \succeq 0$.

The strong dual of this problem is:

maximize:   $w$
subject to: $\begin{bmatrix} F(\Phi)^T & \mathbf{1} \end{bmatrix} \begin{bmatrix} v \\ w \end{bmatrix} \preceq a$.

□


Numerical optimization schemes, in many cases, consist of a search for an initial
feasible solution, and then a search for an optimal solution. In analyzing the linear
program formulation we notice that we either have an optimal solution or the
problem is infeasible, but there is no case when the problem is unbounded (due to
the bounding constraint $\|u\|_1 = 1$).

The dual problem has the property that it either has an optimal solution, or is
unbounded (from duality). Moreover, for any frame $\Phi$, $w = \min_i a_i$ and $v = 0$
is always a feasible solution to the dual problem. This removes the requirement
that an initial solution be found [2].
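The always-available dual starting point can be verified directly. In the sketch below the frame and the weights `a` are hypothetical, and the $k$th dual constraint is written out as $F(\varphi_k)^T v + w \le a_k$.

```python
def F(x):
    """The map F from Section 1.2, returned as a flat list of length d."""
    n = len(x)
    f0 = [x[0] ** 2 - x[j] ** 2 for j in range(1, n)]
    rest = [x[k] * x[j] for k in range(n - 1) for j in range(k + 1, n)]
    return f0 + rest

# Hypothetical frame for R^2 and weights (assumptions made for illustration).
phi = [(0.0, 1.0), (2.0, 1.0), (2.0, -1.0), (1.0, 0.0)]
a = [1.0, 2.0, 2.0, 1.0]

d = len(F(phi[0]))
v = [0.0] * d          # v = 0
w = min(a)             # w = min_i a_i

# Left-hand side of each dual constraint: F(phi_k)^T v + w.
lhs = [sum(fi * vi for fi, vi in zip(F(p), v)) + w for p in phi]
dual_feasible = all(value <= ak for value, ak in zip(lhs, a))
print(dual_feasible, w)
```

By weak duality, the dual objective value $w = \min_i a_i$ is also a lower bound on the optimal value of the primal program whenever the primal is feasible.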
2.2. Barrier Formulations. A sparse solution to the linear program produces
a frame in which the frame elements corresponding to the zero coefficients
are removed. In contrast, one may wish to have a full solution, that is, one may
want to retain all of, or most of, the frame vectors. To enforce this property, we
use a barrier objective.

Proposition 2.3. Let $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ be a frame, and let $0 < \epsilon \ll 1$.
If the problem

(2.3)  $(\mathcal{P}_3)$  maximize:   $\sum_{i=1}^{M} \ln(u_i + \epsilon)$
               subject to: $F(\Phi)u = 0$
                           $\|u\|_1 = 1$
                           $u \succeq 0$

has a feasible solution $u^*$ with a finite objective function value, then the frame $\Phi$
is scalable, and the scaling matrix $X$ is a diagonal operator whose elements are
the square roots of the entries of the feasible solution $u^*$. Moreover, for $\epsilon = 0$, if a solution $u^*$
exists, all elements of $u^*$ are strictly positive.

Proof. Assume $u^*$ is a feasible solution to (2.3) with $0 < \epsilon \ll 1$ and the
objective function finite. Then from Theorem 2.1 we have that the frame $\Phi$ is
scalable. Now assume $\epsilon = 0$. If one of the variables $u_i^*$ were zero, then the objective
function would have a value of $-\infty$. Since we assume the function is finite, this
cannot be the case. A negative value for $u_i^*$ would leave the objective function
undefined, which is likewise excluded by the finite objective.
Therefore, $u_i^*$ must be positive for all $i$. □
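The effect of the log barrier can be seen on a small example. For the hypothetical frame with columns $(0,1)$, $(2,1)$, $(2,-1)$, $(1,0)$ in $\mathbb{R}^2$, the feasible set of (2.3) is the segment joining the two points `p` and `q` below, each of which has zero entries. The sketch evaluates the barrier objective on a grid along that segment; the grid search is a stand-in for a real solver, and all names are assumptions.

```python
import math

# Endpoints of the feasible segment for the hypothetical frame described above.
p = [0.75, 0.125, 0.125, 0.0]
q = [0.5, 0.0, 0.0, 0.5]

def barrier(u, eps):
    """Log-barrier objective sum_i ln(u_i + eps)."""
    return sum(math.log(ui + eps) for ui in u)

def on_segment(t):
    """Convex combination of p and q; it stays feasible because the constraints are linear."""
    return [(1.0 - t) * pi + t * qi for pi, qi in zip(p, q)]

eps = 1e-6
grid = [k / 1000.0 for k in range(1001)]
t_best = max(grid, key=lambda t: barrier(on_segment(t), eps))
u_best = on_segment(t_best)
print(t_best, u_best)
```

The maximiser lies strictly inside the segment, so every coefficient is positive and all four frame vectors are retained, in contrast with the sparse endpoints.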

An alternative barrier is the maximin objective.

Proposition 2.4. Let $\Phi = \{\varphi_k\}_{k=1}^{M} \subset \mathbb{R}^N$ be a frame. If the problem

(2.4)  $(\mathcal{P}_4)$  maximize:   $\min_i \{u_i\}$
               subject to: $F(\Phi)u = 0$
                           $\|u\|_1 = 1$
                           $u \succeq 0$

has a feasible solution $u^*$ with a finite objective function value, then the frame $\Phi$
is scalable, and the scaling matrix $X$ is a diagonal operator whose elements
are the square roots of the entries of the feasible solution $u^*$. Moreover, a solution exists with
positive elements if and only if the solution produced by solving this problem has
positive elements.

Proof. To show this, we shall rewrite this problem as a linear program.

(2.5)  maximize:   $t$
       subject to: $F(\Phi)u = 0$
                   $\sum_{i=1}^{M} u_i = 1$
                   $t \le u_i$ for all $i$
                   $t \ge 0$, $u \succeq 0$.

Here, $t$ is an auxiliary variable, taken to be the minimum element of $u$. This linear
program can be solved to optimality. Moreover, as this problem is convex, the
optimum achieved is global. If the objective function at optimality has a value of
0, then there can exist no solution with all positive coefficients. □
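As a worked illustration of (2.5), consider a hypothetical frame for $\mathbb{R}^2$ that is not taken from the paper. Its feasible set is a one-parameter family, so the maximin problem can be solved by hand; the parameter is called $s$ to avoid a clash with the auxiliary variable $t$.

```latex
\Phi = \begin{bmatrix} 0 & 2 & 2 & 1 \\ 1 & 1 & -1 & 0 \end{bmatrix},
\qquad
F(\Phi) = \begin{bmatrix} -1 & 3 & 3 & 1 \\ 0 & 2 & -2 & 0 \end{bmatrix}.

% All solutions of F(\Phi)u = 0, \sum_i u_i = 1, u \succeq 0:
u(s) = \Bigl( \tfrac{3 - s}{4},\ \tfrac{1 - s}{8},\ \tfrac{1 - s}{8},\ \tfrac{s}{2} \Bigr),
\qquad 0 \le s \le 1.

% The smallest entry is never the first one, since (3 - s)/4 \ge 1/2:
\min_i u_i(s) = \min\Bigl\{ \tfrac{1 - s}{8},\ \tfrac{s}{2} \Bigr\}.

% The minimum is largest where the two expressions agree:
\tfrac{1 - s}{8} = \tfrac{s}{2}
\ \Longrightarrow\ s = \tfrac{1}{5},
\qquad
u^* = \Bigl( \tfrac{7}{10},\ \tfrac{1}{10},\ \tfrac{1}{10},\ \tfrac{1}{10} \Bigr),
\qquad
t^* = \tfrac{1}{10} > 0.
```

Since $t^* > 0$, this frame admits a scaling that keeps all four vectors, whereas the endpoints $s = 0$ and $s = 1$ of the feasible segment each discard at least one vector.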


3. Augmented Lagrangian Method 

To efficiently solve these convex formulations, we employ the method of augmented
Lagrangians. Rewriting norms as matrix/vector products we are interested
in

minimize:   $u^T I u$
subject to: $F(\Phi)u = 0$
            $\mathbf{1}^T u = 1$
            $u \succeq 0$.

For notational convenience, we denote $L$ and $b$ to be

$$L = \begin{bmatrix} F(\Phi) \\ \mathbf{1}^T \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

respectively. The $\ell^2$ problem is now

minimize:   $u^T I u$
subject to: $Lu = b$
            $u \succeq 0$.

To solve this problem, the augmented Lagrangian $\mathcal{L}$ is formed,

$$\mathcal{L} = u^T I u + \langle \mu, Lu - b \rangle + \frac{\lambda}{2}\|Lu - b\|_2^2$$
$$\phantom{\mathcal{L}} = u^T I u + \mu^T L u - \mu^T b + \frac{\lambda}{2}\bigl(u^T L^T L u - 2u^T L^T b + b^T b\bigr).$$

This function is minimized through satisfying the first-order condition, $\nabla \mathcal{L} = 0$.

The first-order condition with respect to $u$ is solved through standard
calculus-based methods. The Lagrangian with respect to the dual variables $\mu$ and
$\lambda$ is linear, which we optimize through gradient descent. The tuning parameter $\eta$
denotes the scaling of the descent direction.

$$\nabla_u \mathcal{L} = 2Iu + L^T \mu + \lambda L^T L u - \lambda L^T b,$$
$$0 = (2I + \lambda L^T L)u + L^T \mu - \lambda L^T b,$$
$$(2I + \lambda L^T L)u = \lambda L^T b - L^T \mu.$$
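Dividing the equation by $\lambda$, we have

$$\Bigl(\frac{2}{\lambda} I + L^T L\Bigr)u = L^T b - \frac{1}{\lambda} L^T \mu.$$

The dual variables have the following gradients,

$$\nabla_\mu \mathcal{L} = Lu - b,$$

$$\nabla_\lambda \mathcal{L} = \frac{1}{2}\langle Lu - b, Lu - b \rangle.$$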
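The three gradient formulas can be sanity-checked numerically against central finite differences of the augmented Lagrangian. The sketch assumes NumPy; the matrix `L`, the vectors, and the value of `lam` are arbitrary test data rather than anything from the paper.

```python
import numpy as np

def aug_lagrangian(u, mu, lam, L, b):
    """u^T u + <mu, Lu - b> + (lam / 2) * ||Lu - b||_2^2."""
    r = L @ u - b
    return u @ u + mu @ r + 0.5 * lam * (r @ r)

def central_diff(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at the vector x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

# Arbitrary test data (assumptions made for illustration).
L = np.array([[-1.0, 3.0, 3.0, 1.0],
              [0.0, 2.0, -2.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])
u = np.array([0.4, 0.3, 0.2, 0.1])
mu = np.array([0.5, -1.0, 2.0])
lam = 3.0
r = L @ u - b

grad_u = 2.0 * u + L.T @ mu + lam * (L.T @ (L @ u)) - lam * (L.T @ b)
err_u = np.max(np.abs(central_diff(lambda z: aug_lagrangian(z, mu, lam, L, b), u) - grad_u))
err_mu = np.max(np.abs(central_diff(lambda z: aug_lagrangian(u, z, lam, L, b), mu) - r))
err_lam = abs(central_diff(lambda z: aug_lagrangian(u, mu, z[0], L, b), [lam])[0] - 0.5 * (r @ r))
print(err_u, err_mu, err_lam)   # all three should be close to zero
```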

Forming the gradient descent algorithm with step size $\eta$ results in

Algorithm 1 Gradient Descent w.r.t. $\mu$
while not converged do
    $\mu_{k+1} \leftarrow \mu_k - \eta\,(Lu - b)$
end while


Algorithm 2 Gradient Descent w.r.t. $\lambda$
while not converged do
    $\lambda_{k+1} \leftarrow \lambda_k - \frac{\eta}{2}\,\langle Lu - b, Lu - b \rangle$
end while


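Lastly, to retain the non-negativity of the solution, we project the current
solution onto $\mathbb{R}^M_+$. This is accomplished by setting any negative values in the solution
to 0 (thresholding). We shall denote this projection by $P_+(\cdot)$. Forming the full augmented
Lagrangian scheme, we now have the complete $\ell^2$ derivation. Here $v_{k+1}$ denotes the
solution of the first-order system in $u$ for the current values $\mu_k$ and $\lambda_k$.


Algorithm 3 Full Augmented Lagrangian Scheme ($\ell^2$)
while not converged do
    $u_{k+1} \leftarrow P_+(v_{k+1})$
    $\mu_{k+1} \leftarrow \mu_k - \eta\,(Lu_{k+1} - b)$
    $\lambda_{k+1} \leftarrow \lambda_k - \frac{\eta}{2}\,\langle Lu_{k+1} - b, Lu_{k+1} - b \rangle$
end while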
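A runnable sketch with the same solve, project, and update structure is given below. It assumes NumPy and, importantly, uses the textbook method-of-multipliers update $\mu \leftarrow \mu + \lambda(Lu - b)$ with a fixed penalty $\lambda$. That sign and step convention is an assumption of this sketch and is not the update printed in Algorithms 1 to 3. The matrix `L` corresponds to a hypothetical frame for $\mathbb{R}^2$ with columns $(0,1)$, $(2,1)$, $(2,-1)$, $(1,0)$.

```python
import numpy as np

def method_of_multipliers(L, b, lam=10.0, iters=60):
    """Minimise u^T u subject to L u = b, u >= 0, by a projected multiplier iteration."""
    m, n = L.shape
    u = np.zeros(n)
    mu = np.zeros(m)
    H = 2.0 * np.eye(n) + lam * (L.T @ L)          # matrix of the first-order system in u
    for _ in range(iters):
        v = np.linalg.solve(H, lam * (L.T @ b) - L.T @ mu)   # solves (2I + lam L^T L) v = lam L^T b - L^T mu
        u = np.maximum(v, 0.0)                               # projection P_+ (thresholding at zero)
        mu = mu + lam * (L @ u - b)                          # ASSUMED textbook multiplier update
    return u

# Stacked constraints [F(Phi); 1^T] and right-hand side [0; 1] for the hypothetical frame.
L = np.array([[-1.0, 3.0, 3.0, 1.0],
              [0.0, 2.0, -2.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])
b = np.array([0.0, 0.0, 1.0])

u = method_of_multipliers(L, b)
Phi = np.array([[0.0, 2.0, 2.0, 1.0],
                [1.0, 1.0, -1.0, 0.0]])
S_scaled = Phi @ np.diag(u) @ Phi.T               # frame operator of the scaled frame
print(u)
print(S_scaled)
```

On this example the iterates stay positive, so the projection never changes them, and the scaled frame operator is a multiple of the identity. For general frames the projection step is a heuristic and convergence of this simplified loop is not guaranteed.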
4. Numerical Examples

The following numerical tests are intended to illustrate our methods by scaling
frames generated from Gaussian distributions. In particular, throughout this
section, we identify (random) frames $\Phi$ with (full rank) random $N \times M$ matrices
whose elements are i.i.d., drawn from a Gaussian distribution with zero mean and
unit variance.

The first set of figures is intended to give a representation of how the scaling
affects frames in $\mathbb{R}^2$. A number of Gaussian random frames are generated in MATLAB,
and a scaling process is performed by solving one of the optimization problems
above (the specific program used is noted under the figures). The Gaussian frame
is first normalized to be on the unit circle. The (blue\circle) vectors correspond
to the original frame vectors, and the (red\triangle) vectors represent the resulting
scaled frame.
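The frame generation step can be sketched as follows. The paper's experiments were run in MATLAB; this Python/NumPy version, including the function name `gaussian_unit_norm_frame` and the seed, is only an assumed equivalent.

```python
import numpy as np

def gaussian_unit_norm_frame(N, M, seed=0):
    """Draw an N x M matrix with i.i.d. N(0, 1) entries and normalise its columns."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((N, M))
    return Phi / np.linalg.norm(Phi, axis=0)     # each column now lies on the unit sphere

Phi = gaussian_unit_norm_frame(2, 7)             # seven unit vectors on the unit circle
S = Phi @ Phi.T                                  # frame operator before scaling
print(np.linalg.cond(S))                         # equals 1 only for a tight frame
```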



Figure 4.1. These examples display the effect of scaling frames 
in R 2 . The frames are sized M = 7 (Left) and M = 30 (Right), 
and were scaled using the Augmented Lagrangian Scheme. The 
left figure shows that scalings favor isolated frame elements. The 
right figure shows that as the frame elements fill the space, the 
scalings become more normalized. 



Figure 4.2. These examples illustrate scalable frames with a
small number of positive weights. The frames are sized M = 7
(Left) and M = 30 (Right), and were scaled using the linear programming
formulation $\mathcal{P}_1$ (more specifically, the Simplex algorithm).
These two examples show that for frames of low (Left) and high
(Right) redundancy, sparse solutions are possible and seemingly
unrelated to the number of frame elements.
Figure 4.3. These frames show full scaling results from the log-barrier
method. The frames are sized M = 15 (Left) and M = 20
(Right), and as was mentioned in Figure 4.1, the scalings favor
isolated frame elements.

The next tables illustrate the sparsity that is achieved in scaling a frame. That
is, using the linear program, we present the average number of non-zero frame
elements retained over 100 trials. Our data seem to suggest the existence of a
minimum number of frame elements required to perform a scaling, and this number
seems to depend only on the dimension of the underlying space. This phenomenon
should be compared to the estimates on the probability that a frame of $M$ vectors
in $\mathbb{R}^N$ is scalable that were obtained in [7, Theorem 4.9].


Sparsity Test Results: Gaussian Frames

N\M      3      4      5      10      20      30      40      50
2        3      3      3      3       3       3       3       3
3        -      -      -      6.01    6       6       6       6
4        -      -      -      10.1    10.12   10.1    10      10
5        -      -      -      -       15.08   15.12   15.11   15.05

Table 4.1. The average number of frame elements retained after
scaling using the linear program formulation. Entries with a "-"
imply that the proportion of scalable frames in the space is too
small for practical testing.


Sparsity Test Results: Gaussian Frames

N\M      150      200      250      500      750      1000
10       56.06    56.02    55.72    55.57    55.66    55.6
15       -        -        123.76   123.98   123.37   123.1
20       -        -        -        217.6    218.6    219.45
25       -        -        -        -        -        338.67

Table 4.2. The average number of frame elements retained after
scaling using the linear program formulation. Entries with a "-"
imply that the proportion of scalable frames in the space is too
small for practical testing.

Observe that the results presented in Tables 4.1 and 4.2 show an interesting
trend. The average number of elements required to scale a frame appears to be

$$d + 1 = \frac{(N-1)(N+2)}{2} + 1 = \frac{N(N+1)}{2}.$$

The linear system being solved during the Simplex method has dimensions $(d+1) \times M$,
and attempts to find a non-negative solution in $\mathbb{R}^M$. It seems unlikely that this
solution can be found using fewer than $d+1$ frame elements.
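The conjectured count is easy to tabulate for the dimensions used in Tables 4.1 and 4.2. The helper name `min_elements` is an assumption made for this sketch.

```python
def min_elements(N):
    """d + 1 with d = (N - 1)(N + 2) / 2, the number of rows of the stacked linear system."""
    d = (N - 1) * (N + 2) // 2
    return d + 1

counts = {N: min_elements(N) for N in (2, 3, 4, 5, 10, 15, 20, 25)}
for N, count in counts.items():
    assert count == N * (N + 1) // 2     # the closed form N(N + 1)/2
    print(N, count)
```

The averages reported in the two tables sit at or slightly above these values for each dimension $N$.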

The final test presents the proportion of scalable Gaussian frames of a given
size over 100 trials. For testing, the number of frame vectors is determined by
the dimension of the underlying space. For a frame in $\mathbb{R}^N$, the number of frame
elements used ranges from $N + 1$ to $4N^2$ (e.g. for $N = 2$, the number of frame
elements ranges from $M = 3$ to $M = 16$). A Gaussian frame of each
required size is generated and a scaling is attempted. This is performed over a hundred trials,
and the proportion of frames that were scalable is recorded.

For each $N$, a plot of the proportions across frame size $M$ is presented. To
display these plots in a single figure, the independent variable $M$ is scaled to lie in
the range (0,1).



Figure 4.4. Each graph in this figure gives the proportion of scalable
frames generated after 100 trials. The sizes of the frames range
from $N + 1$ to $4N^2$. To fit the graphs in a single figure, the range
of each figure is scaled to be from 0 to 1. The frame dimensions are
N = 2 (Blue\Circle), N = 3 (Red\Star), N = 4 (Green\Triangle),
N = 5 (Cyan\Box), and N = 10 (Magenta\x).